Achieving Near-Optimal Convergence for Distributed Minimax Optimization with Adaptive Stepsizes
Sharma et al. (2022) provide improved convergence analyses for Local SGDA. Yang et al. (2022a) integrate Local SGDA with stochastic gradient estimators to reduce the variance of the stochastic gradients. More recently, Zhang et al. (2023) adopt compressed momentum methods with Local SGD to improve the communication efficiency of the algorithm. For centralized nonconvex minimax problems, Yang et al. (2022b) show that, even in deterministic settings, GDA-based methods require a timescale separation between the stepsizes of the primal and dual updates.
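The role of separate primal/dual stepsizes can be seen on a toy quadratic saddle-point problem. The sketch below runs plain GDA with distinct stepsizes for the descent and ascent players; the objective, stepsizes, and iteration count are illustrative choices, not taken from any of the cited papers:

```python
# Gradient descent-ascent (GDA) on the quadratic saddle-point problem
#   min_x max_y f(x, y) = 0.5*x**2 + 2*x*y - 0.5*y**2,
# whose unique saddle point is (0, 0).  The primal and dual stepsizes
# are deliberately different, i.e., the two players run on separate
# timescales.
def gda(eta_x=0.05, eta_y=0.2, steps=500):
    x, y = 1.0, 1.0                            # arbitrary starting point
    for _ in range(steps):
        gx = x + 2.0 * y                       # df/dx
        gy = 2.0 * x - y                       # df/dy
        x, y = x - eta_x * gx, y + eta_y * gy  # descent in x, ascent in y
    return x, y

x, y = gda()
print(abs(x), abs(y))                          # both shrink toward the saddle (0, 0)
```

For this strongly-convex-strongly-concave objective the iterates contract geometrically toward the saddle point for small enough stepsizes; the nonconvex setting discussed above is where the stepsize ratio becomes essential rather than merely convenient.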
Last-Iterate Convergent Policy Gradient Primal-Dual Methods for Constrained MDPs
We study the problem of computing an optimal policy of an infinite-horizon discounted constrained Markov decision process (constrained MDP). Despite the popularity of Lagrangian-based policy search methods in practice, the oscillation of policy iterates in these methods has not been fully understood, raising issues such as constraint violation and sensitivity to hyperparameters. To fill this gap, we employ the Lagrangian method to cast a constrained MDP into a constrained saddle-point problem in which the max/min players correspond to the primal/dual variables, respectively, and develop two single-time-scale policy-based primal-dual algorithms with non-asymptotic convergence of their policy iterates to an optimal constrained policy. Specifically, we first propose a regularized policy gradient primal-dual (RPG-PD) method that simultaneously updates the policy via entropy-regularized policy gradient and the dual variable via quadratic-regularized gradient ascent. We prove that the policy primal-dual iterates of RPG-PD converge to a regularized saddle point at a sublinear rate, while the policy iterates converge sublinearly to an optimal constrained policy. We further instantiate RPG-PD in large state or action spaces by including function approximation in the policy parametrization, and establish similar sublinear last-iterate policy convergence. Second, we propose an optimistic policy gradient primal-dual (OPG-PD) method that updates the primal and dual variables simultaneously via the optimistic gradient method. We prove that the policy primal-dual iterates of OPG-PD converge at a linear rate to a saddle point that contains an optimal constrained policy. To the best of our knowledge, this is the first non-asymptotic last-iterate policy convergence result for single-time-scale algorithms in constrained MDPs.
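The optimistic-gradient mechanism underlying OPG-PD can be illustrated on the simplest bilinear saddle-point problem f(x, y) = x*y, where plain simultaneous GDA spirals outward while the optimistic correction yields linear last-iterate convergence. This is a minimal sketch with an illustrative stepsize, not the paper's constrained-MDP algorithm:

```python
# Optimistic gradient descent-ascent (OGDA) on f(x, y) = x * y.
# Plain GDA (x -= eta*y; y += eta*x) diverges on this problem; the
# optimistic step 2*g_t - g_{t-1} extrapolates the gradient and makes
# the last iterate converge linearly to the saddle point (0, 0).
def ogda(eta=0.1, steps=3000):
    x, y = 1.0, 1.0
    gx_prev, gy_prev = y, x                      # previous gradients (df/dx = y, df/dy = x)
    for _ in range(steps):
        gx, gy = y, x                            # current gradients at (x_t, y_t)
        x = x - eta * (2.0 * gx - gx_prev)       # optimistic descent in x
        y = y + eta * (2.0 * gy - gy_prev)       # optimistic ascent in y
        gx_prev, gy_prev = gx, gy
    return x, y

x, y = ogda()
print(abs(x), abs(y))                            # both decay geometrically toward 0
```

The single extrapolated gradient evaluation per step is what keeps the method single-time-scale: both players use the same stepsize, and the last iterate itself, not an average, approaches the saddle point.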